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Abstract 

Despite the importance of sparsity signal models and the increasing prevalence of high-dimensional streaming 
data, there are relatively few algorithms for dynamic hltering of time-varying sparse signals. Of the existing 
algorithms, fewer still provide strong performance guarantees. This paper examines two algorithms for dynamic 
hltering of sparse signals that are based on efficient £i optimization methods. We hrst present an analysis for one 
simple algorithm (BPDN-DF) that works well when the system dynamics are known exactly. We then introduce a 
novel second algorithm (RWLl-DF) that is more computationally complex than BPDN-DF but performs better in 
practice, especially in the case where the system dynamics model is inaccurate. Robustness to model inaccuracy is 
achieved by using a hierarchical probabilistic data model and propagating higher-order statistics from the previous 
estimate (akin to Kalman hltering) in the sparse inference process. We demonstrate the properties of these algorithms 
on both simulated data as well as natural video sequences. Taken together, the algorithms presented in this paper 
represent the hrst strong performance analysis of dynamic hltering algorithms for time-varying sparse signals as 
well as state-of-the-art performance in this emerging application. 

I. Introduction 

Many signal processing applications involve performing inference on high-dimensional time-varying signals. The 
past decade of work in signal and image processing has demonstrated that signal priors based on the notion of 
sparsity (e.g., compressihility in a basis) lead to state-of-the-art performance in many linear inverse problems (e.g., 
undersampling, inpainting, denoising, compressed sensing, etc. Q-Q) across several different application domains 
(e.g. natural images Q, audio Q, hyperspectral imagery Q, etc.). In this paradigm, a static signal x is observed 
through a linear measurement process y = ^x + e, and the signal is estimated through performing penalized 
least-squares with a sparsity-inducing regularization function. 
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With the recent increase in streaming data sources, it is critical to develop algorithms that can operate efficiently 
on these data streams. In dynamic filtering, a causal algorithm estimates the current system state recursively based 
on the previous state estimate (as opposed to hatch methods which jointly estimate the time-varying states once 
all data is collected). Classic dynamic filtering approaches stem from Kalman filtering Q, which has restrictive 
assumptions (linear systems and Gaussian distrihutions) and involves large matrix inversion that is inappropriate for 
high-dimensional data. Given the historical success of dynamic filtering and recent achievements of sparsity models, 
there are many applications where one can reasonably expect significant gains from jointly leveraging sparsity and 
models of the signal dynamics. Unfortunately, compared to the individual literatures on sparse signal estimation and 
dynamic filtering, there is comparatively little work incorporating both types of information jointly. In particular. 


existing work (reviewed in detail in Section II-CI does not integrate the powerful ii optimization methods from the 
sparsity literature with the successful Kalman philosophy of using higher-order statistics to propagate information 
about past states to the current estimate. Furthermore, existing reports lack strong convergence and performance 
guarantees. 

In this paper we focus on the problem of estimating a time-varying state G that evolves according to a 
Markov process 


— fn (®n—l) “h ^nj (1) 

where Xn obeys a sparsity model at each time step and /n (•) : is the operator (assumed known) 

describing the system evolution with some error G (called the innovation). The desired state is hidden and 
only observed through noisy linear measurements y G 


Vn — GnXji e^ii (2) 

where Gn G is the (known but potentially time-varying) measurement matri}£]that has error e„ G and 

may have M N. 

In this paper we examine two methods for £i-based dynamic filtering of sparse signals. We first consider the case 
where the system dynamics are known with high accuracy. For this case we present a simple and efficient algorithm 
(BPDN-DF) that combines dynamic penalization with optimization and we present analytic accuracy guarantees 
on its performance. We then introduce a novel second algorithm (RWLl-DF) that performs better in practice at the 
cost of a marginally higher computational complexity than BPDN-DF. In particular, RWLl-DF continues to perform 
well when the system dynamics model is highly inaccurate. RWLl-DF achieves robustness to model inaccuracy via 

'For static problems we will use as the observation matrix and for dynamic problems we will use G in order to better differentiate the 
two cases. 
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a hierarchical prohahilistic data model and propagation of higher-order statistics from the previous estimate in the 
sparse inference process, much as in traditional Kalman filtering. We demonstrate the properties of these algorithms 
both in extensive simulations as well as on natural video sequences. In total, we present here both the first strong 
performance analysis of causal dynamic filtering for temporally evolving sparse signals, as well as state-of-the-art 
performance in this emerging application. 


II. Background and Related Work 


A. Sparse Signal Estimation 

In the sparsity model, we assume the signal of interest can be written x = Wz, where the coefficient vector 
2 : G is mostly composed of zeros and the dictionary W may be orthonormal or overcomplete. In the Bayesian 
interpretation, this model assumes a prior distribution on z which has high kurtosis to encourage coefficients to 
be zero (or close to zero). Most commonly, an independent, identically distributed (IID) Laplacian distribution is 
used as the coefficient prior p{z) = where 7 is related to the variance of the individual coefficients and 

||z||^ = Ylii l^[*]| indicates the ii norm. Under a Gaussian noise assumption on the measurements y = ^x + e, the 
corresponding MAP estimate of the signal reduces to the Basis Pursuit De-Noising (BPDN) optimization program 


z = argmin 

Z 


1 ,, ,,2 ,, ,, ' 

^\\y - $vrz ||2 -hTlI^lli 


(3) 


where al is the measurement noise variance and the signal estimate is ® = Wz. BPDN has been a particularly 
popular approach due to its strong performance guarantees 0 and the development of many specialized optimization 
approaches 0-|13|. As a guarantee of the measurement quality, we say that $ satisfies fhe resfricfed isometry 
property with parameters 2S and 6 (RIP(2S',5)) with respect to W if for every 25-sparse vector z we have that 


C{l-5)\\z\\l < ||$PUz||2 < G(l + <5) ||z||2 


(4) 


If z has only S non-zeros and $ satisfies RIP(25',(5) for an appropriafe value of 6, fhen fhe BPDN esfimafion error 
can be shown to be bounded as 


I II II ^ ~ ,, 

|z - 2:||2 < Cl ||e||2 -h ||z - 2s||^, 


(5) 


where Ci and C 2 are constants and Z 5 is the best S'-term approximation to z |[l^-| 161. The RIP condition serves 
to quantify the quality of the measurements by bounding how similar measurements of different signals could be. 
The choice of basis W is particularly important in ensuring the RIP holds. For orthonormal W, simply ensuring 
that the dot products between the rows of $ and the columns of W are small (i.e. incoherent) is sufficient. In the 
case of overcomplete dictionaries, the inner products between the columns of W must also be small, a property 
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that can still be found in tight frames. 

In Q, the variance parameter 7 is assumed known and identical for each coefficient. The RWLl algorithm p7| 
improves performance by generalizing these assumptions through a hierarchical probabilistic model known as the 
Laplacian scale mixture (LSM) 1181. The LSM introduces a second layer of random variables A G representing 


the variance of the first level Laplacian coefficients z, allowing a variable SNR on each coefficient that is also 
inferred from the data. The LSM uses the Gamma distribution (a conjugate prior for the Laplacian) as the hyperprior 
on A[z] 

A^-ir 


p(A[i]) = 




(a) 

where T (•) is the Gamma function and a and 6 are hyper-parameters with £^[A[f]] = a9 and Var[A[f]] = 

Assuming a = 1 and a Laplacian conditional distribution on the coefficients p(z[f]|A[f]) = the 

MAP estimate for the full model can be written as 


|z, a| = argmin \\y — $VLz ||2 + ^ |A[f]z[f]| 

i 

- log [ ^ A[i] ) + 0-1 ^ A[i]. 


Though this optimization program is difficult to optimize directly, employing expectation-maximization (EM) 1181 


results in the RWLl algorithm p7| where the following equations are iterated until a convergence criteria is met: 


z = 


argmin \\y — $LLz ||2 -|- Aq ^ |Ai[f]z[i] 


A*+i[A:] = 


/3 


|zi[fc]|+7’ 

with t indicating the iteration number and {Aq,/?, 7 } constant hyperprior parameters. Several papers have followed 


up on this RWLl approach, including a description of it as a compromise between £i and £q minimization 1191, [201 

1. Intuitively, coefficients with little current evidence will have 


and an analysis of performance guarantees | 211 - 
their variances decreased so that future iterations are more likely to estimate them close to zero, and coefficients 
with strong current evidence will have their variances increased to make future iterations more likely to use those 
coefficients. 


B. Dynamic Signal Estimation 

In dynamic filtering, knowledge of the system dynamics is used to set a prior probability distribution on the current 
estimate. In the classic Kalman filter setup, the linear and Gaussian assumptions (including the system dynamics 
fk {x) = Fkx) translates to a Gaussian assumption on the innovations ^ M (0, Q) and on the distribution of the 
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(b) Unscented Kalman Filter 




Fig. 1. Information propagation in dynamic filtering algorithms, (a) Classic Kalman filtering propagates the current estimated mean and 
covariance (blue) through the system dynamics to generate a prior distribution (red) for the next state estimate. This prior is used in conjunction 
with the likelihood function (green) from the measurements to form a posterior (black) used in estimation, (b) The unscented Kalman filter 
(and particle filtering generally) estimate the prior distribution via a sampling process that empirically approximates the distribution, (c) 
BPDN-DF adds a sparsity inducing term to the prior, resulting in a combined prior that is curved outward reminiscent of an f '2 ball, (d) 
RWLl-DF uses the previous estimate to set the parameters of a prior that has the diamond-like shape known to promote sparse solutions. 


current state conditioned on the previous state p{xk\xk-i) ~ M {F^Xk-i, Q). Similarly, assuming ek ^ M (0, R), 
the likelihood on the measurements at time k conditioned on the current state is p{yk\xk) M {GkXk,R). 
Using the Markov property of the state distributions we can write the current state’s marginal distribution as 
p{xn) ~ N {xn-iiFnPn-iF^ + Q)^ where Pn is the covariance matrix of the previous state estimate. The 
Kalman filter infers the current state via a MAP estimate 


Xr 


arg min 

Xn 

+ 1 


llf/n GnXfi 

FnXn—l 


||2 

\\r,2 

||2 

IIf„P„_iFJ+Q,2 > 


( 6 ) 


which has an analytic solution (known as the Kalman update equations Q). The causal estimation procedure is 
depicted in Figure [T]^a). Intuitively, the estimator propagates the previous state estimate (and its covariance matrix) 
through the system dynamics to get a prediction (i.e., a prior) on the current state. This prior is combined with the 
likelihood resulting from the current measurement to generate a posterior distribution, which is used to estimate 
the current state. Though the causal estimator is computationally simpler than joint estimation of the states, it still 
calculates the same estimate for Xn as if all of the previous data had been used due to the information propagation 
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at each iteration via the covariance matrices. 

Unfortunately, the analytic simplicity and optimality guarantees of the Kalman filter are highly dependent on 
the linear and Gaussian model assumptions. Though not optimal estimators, many heuristic approaches have been 
introduced that follow the spirit of the Kalman filter while incorporating nonlinear system dynamics (e.g.. Extended 


Kalman Filter [241) or non-Gaussian structure. An alternate approach for highly non-linear functions or non-Gaussian 


statistics is particle filtering, which uses discrete points (particles) to approximate the relevant distributions and to 
propagate those distributions through nonlinear dynamics functions. The Unscented Kalman filter | [25| (depicted in 
Figure [T]^b)) is similar but uses deterministic (rather than Monte-Carlo) particles. Though particle filtering approaches 
are general, these methods can become intractable in high-dimensional state spaces and do not explicitly utilize 
information about sparsity structure. 


C. Sparsity in Dynamic Signals 

Recent work has begun to address time-varying sparse signal estimation. One method to leverage both temporal 


correlations and sparsity is to take a batch processing approach (e.g. |26|-|32|). Batch processing is useful in 

a number of application, in particular when measurements can be amassed over a time period and analyzed 
simultaneously. While batch processing can achieve lower estimation errors and permit greater freedom in cost 
function design, these methods cannot be run causally, excluding them, for instance, from real-time applications. 
Additionally, operating on larger batches of measurements may require increased computational complexity (e.g. 
memory or computational time) than online methods. Thus, batch processing techniques are significantly different 
from our focus on causal (online) estimation. Recent work has also addressed sparse state estimation via efficient 


updates of previous solutions using specific opfimizafion algorithms, such as homotopy methods |33|, |34| or 


iterative thresholding |16|. While utilizing the dynamic information implicitly to reduce computational complexity, 
these approaches do not attempt to improve the estimator quality through modeling signal dynamics. 

Another branch of previous work essentially modifies fhe Kalman filler equalions to induce sparsity in the 
estimate. One approach in this vein employs a pseudo-norm in the Kalman update equations to encourage sparser 


solutions 1351 and then enforces an ii constraint on the state after the Kalman update. Another approach |36|, |37| 


works in two-steps, first performing support estimation using an £i cost function and then running the traditional 


Kalman equations on a restricted support set. While the work in 1361, 1371 assumes sparse innovations, the robustness 


to model mismatch has not been fully explored. As will be shown in Section these approaches appear to lack 
robustness to the innovations statistics, perhaps due to the support set estimation being particularly sensitive to 
estimation errors. From a computational perspective, storing and inverting full covariance matrices is also prohibitive 
for the high-dimensional data where sparsity models have been most successful. 
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More recent approaches have considered direct coefficient transition modeling via Markov models |38|, |39|, 


either hy utilizing message passing to propagate support information through time |39| or hy using the previous 
estimate to influence coefficient selection through modified orthogonal matching pursuit ||38|. These approaches 


do not rely on solving complex optimization problems and so are extremely computationally efficient. However, 
these approaches either restrict the dynamics model to specific applications | [39t (i.e. the approach as specified and 
implemented restricts the dynamics function to f{x) = x) or have limited robustness due to the fact that they 


strictly enforce a support set estimate |38|. Additionally, they currently lack strong accuracy guarantees. 


In another direction, some proposed algorithms (e.g. 140|-|44|) essentially modify the BPDN objective function 
to include a term for the dynamic state prediction, including the BPDN-DF algorithm analyzed in the next section. 
Such approaches are simple and can work well, but lack robustness to model mismatch due to their explicit strong 
assumptions on the innovations statistics (see the discussion in |[43|). Until now, these approaches have also lacked 
strong accuracy guarantees. 


III. Basis Pursuit De-Noising with Dynamic Filtering 


The first algorithm we explore is basis-pursuit de-noising dynamic filtering (BPDN-DF) |421, |431 (similar to |271, 
j), which is a simple and efficient method that can work well when the dynamics are known. BPDN-DF modifies 
the BPDN estimator to include a term from the Kalman filter optimization that enforces consistency with the model 
prediction from the previous state estimate 


Zn = argmin ll^ri - zW^ + 7 H^l 


+k\\Wz - f (Xn- 


IJII2 > 


(V) 


where 7 and k are tradeoff parameters and the state is recovered via Xn = Wz^- Thus BPDN-DF seeks a sparse 
vector that matches both the measurements and the causal prediction obtained via the known dynamics model. As 
depicted in Figure [^c), BPDN-DF combines the penalty for deviations from the dynamic prediction / with 

the typical measurement likelihood and the sparsity penalties from BPDN to obtain an overall cost function for 
estimating the current state. 

BPDN-DF has shown empirical performance improvements over standard Kalman filters and independent BPDN 


estimation at each time-step (i.e., using BPDN at each time step without temporal regularization) |43|. Additionally, 
the algorithm is simple and requires a similar computational complexity as standard BPDN, making it appropriate for 
high-dimensional data. However, while BPDN-DF does pass first-order information forward from the previous state 
to the current estimator, this procedure creates a combined regularizer with a “rounded” shape in some directions 
(illustrated in Figure [TJc)). This bulging of the combined prior starts to resemble the £2 ball, and is in conflict with 












the geometric structures known to produce sparse solutions (e.g., the hall) Q, [451. This can cause robustness 
problems in BPDN-DF when there are inaccuracies in the dynamics model (demonstrated later). 

Aware of these potential drawbacks, analytic performance guarantees would be valuable because BPDN-DF 
remains an effective and computationally attractive algorithm for some applications. Prior work in | |44| provided 
guarantees on the statistical regret of an estimator similar to BPDN-DF called the dynamic mirror descent (DMD). 
Although the DMD is more general in terms of its formulation, the guarantees it provides (i.e., bounded regret) 
are in terms of a comparison with an optimal non-dynamical cost function and are therefore difficult to directly 
relate to the cost function in Equation While our preliminary analysis of BPDN-DF using simple techniques 
provided some accuracy guarantees, these bounds were too loose to be an accurate reflection of the estimator 


performance |W. In this work we adopt a proof technique similar to |16|, |47| which provides much stronger 
accuracy guarantees for BPDN-DF, as summarized in the following theorem. 


Theorem Suppose that at each time n, Gn satisfies RIP(S + 2q,6) for some constant q > 0, and the error 

and innovations satisfy ||en|l 2 — ^ Il*^n|l 2 Furthermore, suppose that for all Xi, X 2 , the dynamics satisfies 
11 / (xi) — f (* 2)112 < f* ||®i ~ * 2112 /'^^ universal constant f*, and the coefficients at each n satisfy < b. 

If y > 0, K > 0 are known constants and the following condition is met: 


AC (( 1 -r)6-(l +5)7^7-^)> 

(1 -f + \/l + 5e - (1 - (5) b, 


then the solution to Equation 0 satisfies 


— z 


n\ 12 


<||eo||2 + (1 - n (CiV? + ^26 + C 3 F) 


where the constants are defined as: 

/? 

Cl 

C 2 

C3 


1 + K — 5 
(1 -I- (5)(1 -I- /c)7 
1 -|- AC — (5 — Ac/* 

y/m 

1 + K — 5 — Ac/* 
AC 

1 + K — 5 — Ac/* 


( 8 ) 


The full proof of Theorem III.l is outlined in Appendix VII-A| Note that the error 
components: a transient term depending on the initial uncertainty that decays if /3” —)• 0 , 
that emerges depending on the inherent difficulty of the problem (i.e., the noise sources 


bound has two distinct 
and a steady state term 
and the sparsity of the 
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Fig. 2. The RWLl-DF algorithm inserts the dynamic information at the second layer of the LSM model for each time step. The graphical 
model depicts the model dependencies, where prior state estimates are used to set the hyperpriors for the second level variables controlling 
the variances (i.e., SNRs) of the state estimates at the next time step. 


signal). If K = 0 (removing the dynamics in the estimation), we obtain exactly the results in 1161 for solving BPDN 


with no dynamic filtering term. Alternatively, as 7 —)■ 0, the result in Theorem III. 1 ceases to hold as convergence 


is no longer guaranteed. Specifically, 7 = 0 violates a condition in Lemma VII. 1 which is required to prove 


algorithmic convergence. We note that the same condition in Lemma VII. 1 necessitates the condition || 2;„||2 < b, 
indicating that b ^ 00 can also cause a loss of convergence, and thus accuracy, guarantees. 

Theorem |III. 1 1 implies that BPDN-DF is only guaranteed to converge if /3 < 1. Since most parameters are system 
or signal dependent (and therefore not controllahle hy the user), we interpret this as a condition on the parameters 
7 and K. In particular, since 7 does not appear in the expression for (3, we can consider this to he a hound on k, 

1-6 


K < 




if r>L 


(9) 


This condition essentially compares the smoothness of the dynamics function with the RIP conditioning of the 
measurements. For example, as 6 approaches 1, the allowable range of k shrinks indicating that the dynamics 
should be deemphasized in BPDN-DF. Alternatively, as f* approaches 1, the range of k increases, indicating that 
the dynamics can be emphasized in the optimization cost. Interestingly, if /* < 1, the requirement that /3 < 1 
induces no additional restrictions on k as the numerator of Equation Q is positive. With a negative denominator 
in Equation Q, this condition is redundant with the prior requirement that k is positive. The steady-state error 
bound depends on both the maximum measurement error energy e and the maximum innovations energy F. The 
parameter k trades off between the two, where the trade-off takes into account the RIP conditioning 6 as well as 
the dynamics smoothness f*. 
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IV. Re-Weighted li Dynamic Filtering 

Despite now having provable guarantees, performance of BPDN-DF degrades with increased inaccuracy in the 


dynamics model. In fact, Theorem III. 1 requires that the innovations (a measure of the dynamics inaccuracy) satisfies 
certain inequalities with respect to the measurement noise and sparsity. In this section we present a novel algorithm 
called re-weighted dynamic filtering (RWLl-DF) that uses a richer statistical model to improve performance, 
especially when the dynamics model is inaccurate. In contrast to existing algorithms (including BPDN-DF), RWLl- 
DF displays a combination of flexibility (allowing arbitrary dynamic models), computational efficiency (leveraging 
advances in minimizafion), and robustness to dynamic model errors caused by sparse innovations. The main 
technical innovation we propose is to use the hyper-priors in the LSM model to incorporate a dynamic prediction into 
second-order statistics for the next estimate in a causal fashion (akin to Kalman filtering). The resulting estimation 
procedure is illustrated in Figure [TJd) and the graphical model is depicted in Figure While some existing 
literature pT[, |[48|-p0| has proposed re-weighted estimation schemes to include prior information in a limited 


way (e.g., using only binary weights based on support set estimates), RWLl-DF modifies RWLl specifically for 
dynamic filfering and uses a general probabilisfic model fhaf improves robusfness by allowing arbifrary weighfings. 

Technically, similar to the LSM model, our proposed model describes the conditional distribution on the sparse 
coefficients 2 at time n as a zero-mean Laplacian with different variances: 


We also set the scale variables controlling the coefficient variance to be Gamma distributed: 


P{>^n[i]) 


e-\i]Tia) 


and allow each of these variables to have a different expected value, {a6n[i] = F^[An[i]]) by modifying 0n[i\- To 
introduce dynamic information, the expected value of each scale variable is set based on the previous state estimate. 
In particular, if the model prediction of Zn[i] based on the previous state is large (or small), the variance of that 
coefficient at the current estimate is made large (or small) by making A„[i] small (or large). Large variances allow 
the model flexibility to choose from a wide-range of non-zero values for coefficients that are likely to be active and 
small variances (with a mean of zero) encourage the model to make such coefficients inactive. By “encouraging” 
the model through the use of second order statistics (instead of forcing the model to use an estimated coefficient 
subset) the model remains robust and flexible. While there are many possible choices for the mapping of predictions 
into the model at the next time step, in this paper we choose 


en[i\=i{\W-^fn{WZn-l)\i]\+v) , 
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where W~^fn (Wzn-i) [i] is the coefficient of the previous signal propagated through the dynamics, r/ is a 
linear offset and ^ is a multiplicative constant. Note that any general model for the dynamics is allowable and we 
are not restricted to linear dynamical systems in the prediction. Note also that the absolute values are necessary 
because A determines a variance, which must be strictly positive. The parameter r] determines the distribution of 
the variance when the coefficient is predicted to be zero, resulting in W~^fn {Wzn-i) [i] = 0. This parameter 
reflects the magnitude of the innovations in the erroneous model predictions. 

Under this model, the joint MAP estimate of all model parameters becomes 

[zn,Xn] = aigmin \\yn - GnWz\\l +(3 y^\X[i]z[i]\ 

[z,A] Y 

-a^log(A[z]) (10) 

i 

+ J E [f]| + V). 

i 

As in the LSM case, the optimization in ( [T0| ) is not easily solved for both Zn and A„ jointly, but the model yields 
a simple form when using an EM approach. The precise steps of the iteration in this case are 


E step: A*„ = Ep,^x\zi) [A] 

M step: z^ = arg min — log 

Zn 

where t denotes the EM iteration number and Ep(^x\z„) ['] denotes the expectation with respect to the conditional 
distribution p(A|£„). We can write the maximization step as 


P 


■ 2)1 1A 


= arg 


mm\\yn - GWz\\l + Aq E |A*[*]2 [f] 


( 11 ) 


since the MAP optimization conditioned on the A parameters reduces to a weighted £i optimization. 

The conjugacy of the Gamma and Eaplacian distributions admit a simple closed-form solution for the expectation 


step in this model that is similar to the original ESM model 118|, 




2t 


(3\zi[i\\ + \W-^fniWZn-l)[i]\+V^ 


( 12 ) 


where r = (a + 1) ^ is a constant scaling value, P = AqC can be interpreted as a tradeoff between the measurement 
and the prediction, and the signal of interest can again be recovered via Xn = Wzn- The resulting procedure 
that iterates between Equations o and ( [T 2 I ) looks nearly identical to the static RWEl algorithm except that the 
denominator in the A update contains a term depending on the previous state. This term encourages smaller A values 
(i.e., higher variances) in the elements that are predicted to be highly active according to /„ (®„_i). This graduated 
encouragement of coefficients selected by the prediction allows the algorithm to perform especially well when the 
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states and innovations are sparse while retaining good performance when the innovations are denser. Furthermore, 
the simple form means that the recent work in l\ optimization methods can he leveraged for computationally 
efficient recursive updates, requiring only a small number of l\ optimizations at each time step. In particular, 
since no covariance matrix inversion is required and many modern l\ estimation methods require only matrix 
multiplication (and no inversion), this approach is also amenable to high-dimensional data analysis. 

Despite being highly nonlinear, RWLl-DF has demonstrable stability and convergence properties. First, the 
RWLl-DF update equations in ( [TT] ) and ( [T^ directly guarantee that the coefficient and variance estimates will be 
bounded. To see this for an arbitrary time step re, note that the variance estimates are within the range G 

(0,2T/r7] for each EM iteration. With bounded variances, the weighted BPDN optimization in ( [T^ will yield a 
solution where the output coefficients are also bounded. Second, existing guarantees for the EM algorithm guarantee 
that the EM iterations in the proposed algorithm (i.e., an estimate at a single time step) have coefficient differences 
that asymptotically converge to zero, limt_).oo ||^n “ = 0 |51|. Existing stronger convergence guarantees 


(from EM or from the very limited analysis on RWEl algorithms | [^ ) do not apply in this case because of 
the technical details of RWEl-DE. Despite the lack of stronger convergence guarantees, the numerical results in 
section [V] demonstrate that the algorithm converges in practice with just a few EM iterations. 


V. Simulation Results 

While the dynamic filtering approaches discussed here are general inference tools with many potential applica¬ 
tions, we focus our evaluation on compressed sensing (CS) recovery as an example test case. In all simulations 
we compare the performance with existing algorithms discussed in Section II-C| when possible, noting in each 
particular case where algorithms were unable to be evaluated because they are incompatible or computationally 
prohibitive. Standard Kalman filtering is not shown because it performs very poorly in these type of simulations (i.e., 
it doesn’t converge to a stable estimate with the sparse statistics of the applications we use) g3|. The performance 
of independent BPDN (Equation Q) and independent RWEl (Equations Q), each applied independently at each 
time step with no temporal information, are also shown to highlight the benefit of including dynamic information. 


A. Stylized tracking scenario 

To explore the performance and robustness of the algorithms in detail, we first perform inference on synthetic 
data that simulates a stylized tracking scenario. The use of synthetic data provides us with “ground truth” so we can 
make controlled variations of the data characteristics. In this setup, we generate an image with S non-zero moving 
pixels of various intensities that move with time and represent targets that must be tracked. The movement of these 
non-zero pixels is specified fo be constant motion, and the simulated dynamics includes a sparse innovations 
term (i.e., dynamic model error) that causes target motion to change in each time step for some percentage of the 
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pixels p. In other words, at every time step there is a prohahility p of each target abruptly changing directions 
to violate the dynamics model assumed hy the inference algorithms. This process simulates an innovation that is 
approximately 25p-sparse at every iteration, allowing us to evaluate the algorithm’s robustness to a type of model 
mismatch (i.e., shot noise) that is particularly challenging for Kalman filter techniques. 

In detail, we create 24x24 pixel videos {N = 24^) with 20 moving particles (S = 20). The vectors are observed 
with Gaussian measurement matrices (with normalized columns) that are independently drawn at each iteration, and 
we add Gaussian measurement noise with variance cig = 0.001. We vary the number of measurements to observe 
the reconstruction capability of the algorithm in highly under sampled regimes, but the number of measurements per 
time step is always constant within a trial. All simulations average the results of 40 independent runs and display 
reconstruction results as the relative mean-squared error (rMSE) for each frame, calculated as: 


|®fc ®A;||2 


(13) 


\x\ 


For independent BPDN, at each iteration we use the value A = O.SSfig. For independent RWFl we use Aq = 
0.0011, T - \ and v - 0.01. For BPDN-DF we use 7 = O.Scig and k - 0.0007/(p + 1). For RWFl-DF we use Aq = 
0.0011, T - I and u - 1 — 2p/S. These parameters were optimized using a manual parameter sweep. Furthermore, 
for comparison we also show the performance of an optimal oracle least-squares solution, where the support at 
each iteration is known {Gk becomes overdetermined). Note that this scenario is particularly challenging for many 
existing algorithms because of the arbitrary model dynamics (e.g., / I) that may vary with time. In particular, 

comparisons with the algorithms described in | |40| , | |4T| , |48|, |49| (or modifications of them to accommodate 
arbitrary dynamics) were attempted and are not shown here because they still performed significantly worse than 
static estimation (e.g., BPDN) even after extensive searching for good parameter settings. 

The iterative algorithms based on RWFl are stopped when the relative norm-squared difference between co¬ 
efficients at consecutive iterations ||z^ — falls below a specified fhreshold (we use 0 . 1 %), which 

fypically requires jusf a few (fypically 5-7) iferafions. Figure shows fhe relafive coefficienf change over EM 
iferafion, demonsfrafing fhaf RWFl (i.e. wifhouf dynamics) convergence occurs by 10 iferafions and RWFl-DF 
acfually converges fasfer due fo fhe improved performance from incorporafing dynamic informafion. Figure Qa) 
shows a single frial wifh M = 80 measuremenfs and 2Sp - 5 innovation errors af each fime sfep. Nofe fhaf 
RWFl-DF reaches subsfanfially lower sfeady-sfafe recovery error fhan BPDN-DF, illusfrafing fhe nef improvemenfs 
gained by using second-order sfafisfics in fhe esfimafion. Figure Qb) displays fhe resulfs of varying fhe number 
of measuremenfs while holding 2Sp = 5. While performance for all algorifhms becomes comparable for large 
numbers of measuremenfs, if is clear fhaf exploiting femporal informafion can mosf improve performance in fhe 
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2 4 6 8 10 

EM Iteration Number 


Fig. 3. Convergence of RWLl and RWLl-DF. The mean relative change (over 200 frames) in the coefficients is plotted, with the error 
bars indicating the maximum and minimum values. The relative norm difference of the coefficients in the RWLl and RWLl-DF algorithms 
falls quickly over the first 10 iterations. The dynamic information helps the RWLl-DF algorithm converge faster, requiring approximately 
5-7 iterations to converge. 



Fig. 4. Behavior of the RWLl-DF algorithm on synthetic data, (a) RWLl-DF converges to a lower mean rMSE than static sparse estimation 
or BPDN-DF. Shown for M = 80, W = 576, S = 20 and p = 0.25. (b) When sweeping the number of measurements M for = 576, S = 20, 
and p = 0.25, we observe that the performance improvement for RWLl-DF is especially distinct in the highly undersampled regime. Each 
point is the average steady state rMSE over 40 independent trials, (c) RWLl-DF is also more robust to model mismatch in the innovation 
statistics. Shown here for different innovations sparsity (2Sp] for M = 70, N = 576, and S = 20. Each data point is the result of averaging 
the steady-state rMSE over 40 independent trials. Note that when BPDN-DE starts to perform better (2Sp — 10), the innovations are actually 
half of the total support set and a Gaussian innovations model may be more accurate than a sparse innovation model. 


highly undersampled regime. In particular, RWLl-DF is able to sustain virtually the same steady-state rMSE 
down to much more aggressive levels of undersampling than BPDN-DF. Finally, we explore the robustness of 
each algorithm to model errors by fixing the number of measurements (M = 70) and varying the sparsity of the 
innovations 2Sp. Figure |^c) shows the results, illustrating that RWFl-DF uses the second-order statistics to sustain 
better performance than BPDN-DF when the innovations are sparse (i.e., shot noise). We note that when 2Sp > 8, 
the total number of model errors is 50% of the signal sparsity, indicating both that the innovation is dense, and 
that the innovations has as much, or more, energy than the signal. The combination of the innovations statistics 
mismatch (sparse vs. dense) and reduced predictive power of the dynamics model produces a phase transition where 
RWFl-DF is no longer able to correct large innovation values. 
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Fig. 5. CS recovery of the full Foreman video sequence. Each curve represents the rMSE for recovery from subsampled noiselets {MIN 
= 0.25) using the DT-DWT as the sparsifying basis. The independent BPDN recovery (dotted blue curve) and the independent re-weighted 
BPDN (solid green curve) retain a steady rMSE over time. RWLl-DF (dashed cyan curve) converges on a lower rMSE than either time- 
independent estimation and remains at approximately steady-state for the remainder of the video sequence. BPDN-DF (the dot-dash red 
curve) can converge to low rMSE values, but is highly unstable and can yield very poor results when the model is not accurate due to motion 
in the scene. 


B. CS recovery of natural video sequences 

To test the utility of RWLl-DF on natural signals, we explore its performance on a simulation of compressively 
sampled natural video sequences. These results will report in-depth comparison of a single challenging video 
sequence (the Foreman sequenc^ as well as aggregate statistics from a hatch of video from a BBC nature 
documentary (as used in p2}). The documentary footage is valuable as hroad comparison because it contains 
many different types of motion, including static frames with localized changes and highly dynamic frames with 
moving subjects across large portions of the visual field. 

In our simulation of CS video recovery, we take the time-varying hidden state Xn to be the wavelet (synthesis) 
coefficients at each frame of the video. While the true frame-by-frame dynamics of natural video are likely to be 
complex and non-linear, for this simulation we use a simple first-order model predicting that the coefficients will 
remain the same from one frame to the next: fk {x) - x for all k. Improvements to this simple model would improve 
any algorithm that can incorporate more advanced models, including RWLl-DF which allows arbitrary dynamics 
models. A benefit to assuming stationary dynamics is that it allows us to compare recovery performance with a 
number of existing algorithms that do not currently have arbitrary dynamics as part of the approach, including 
DCS-AMP 1531 and modCS | |4T| . To demonstrate the advantage of using the coefficient values instead of only 
support information, we also compare to a modified version of the binary-weighted BPDN algorithm described 


in 1481 where we have added support set prediction based on model dynamics. We call this approach weighted 
with prior information (WLIP). 


We use a four-times overcomplete dual-tree discrete wavelet transform (DT-DWT) |54| as the sparsity basis and 


we simulate CS measurements by applying a subsampled noiselet transform |55| to each frame. We take M - 

^The Foreman sequence is available at: http://www.hlevkin.com/TestVideo/foreman.yuv 
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Fig. 6. A comparison of RWLl-DF with existing recovery algorithms (DCS-AMP, modCS and WLIP) for the Foreman video sequence. 
Each curve represents the rMSE for recovery from subsampled noiselets {MIN = 0.25) using the DT-DWT as the sparsifying basis. 


0.25N measurements per frame (where N - 128^) with a measurement noise variance of We solve all 

optimization programs using the TFOCS package | [57| due its stability during RWLl optimization and the ability to 
use fast implicit operators for matrix multiplications. In the Foreman video sequence, we simulate CS measurements 
on a 128x128 pixel portion of the video, using parameters A = 0.001 for BPDN, Aq = 0.11, r = 0.2 and u - 0.1 
for RWLl, 7 = 0.01 and k - 0.2 for BPDN-DF, and Aq = 0.003, t - 4, (3 - I and = 1 for RWLl-DF. These 
parameters were found using a manual parameter sweep to optimize performance for each algorithm and number of 
CS measurements. While the number of measurements was fixed in this simulation for computational tractability, 
recovery performance could be altered for all algorithms by adjusting the number of measurements. 

Figure 1^ shows the recovery of 200 consecutive frames of video in the Foreman sequence. We see that RWLl- 
DF converges to the lowest steady-state rMSE and is able to largely sustain that performance over the sequence. 
In contrast, BPDN-DF cycles through periods of good performance and poor performance, sometimes performing 
worse than not using temporal information at all. In essence, BPDN-DF is not robust to model errors, and each 
time there is motion in the scene (violating the simple dynamics model) the algorithm has to re-converge. The 
RWLl-DF approach does not exhibit this performance oscillation because the use of second-order statistics to 
propagate temporal information is less rigid, allowing for more robustness during model errors. Figure shows a 
comparison of recovery for the Foreman video sequence across several existing algorithms, including DCS-AMP, 
modes and WLIP. Again we optimize algorithms parameters manually to achieve the best aggregate performance 
over the video sequence. To summarize the performance over the entire video sequence. Figure shows histogram 
plots of the rMSE values over the 200 frames for each recovery. The mean and median for each histogram are 

^We use the noiselet transform because it can be computed with an efficient implicit transform and has enough similarity to a random 
measurement that it works well in CS |56[. 
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BPDN 



rMSE 


RWL1 BPDN-DF RWL1-DF 




Fig. 7. Histogram of the rMSE for the compared algorithms when recovering the Foreman video sequence with the DT-DWT as the 
sparsifying basis. RWLl-DF achieves a lower mean (indicated by the dashed green lines) and median (indicated by the red arrows), with a 
tightly concentrated error distribution due to the robustness to model mismatch (producing few outliers). Specihc mean and median values 
are shown in Table |I] 



BPDN 

RWEl 

BPDN-DE 

RWEl-DP 

DCS-AMP 

WEIP 

modes 

Mean rMSE 

3.84% 

3.09% 

3.29% 

1.63% 

3.48% 

3.27% 

5.22% 

Median rMSE 

3.85% 

3.07% 

2.78% 

1.61% 

2.57% 

3.27% 

4.58% 


TABLE I 

Mean and median values eor compressive recovery of the Foreman video sequence. 


represented by the green dashed line and the red arrow respectively, and are listed in Table The recovery errors 
for BPDN-DF are significantly spread out, achieving nearly the same median error as the independent algorithms 
and having a much higher mean error due to the large excursions during model mismatch. BPDN-DF actually 
may be a worse choice than not using any temporal information at all when model errors are present. In contrast, 
RWLl-DF shows a much tighter distribution of errors, having a mean and median significantly lower than alternate 
approaches. 

In addition to the in-depth comparison on the single Foreman video sequence, we also perform the same CS 
recovery task on a database of video sequences from a nature BBC documentary to investigate the performance 
across a wider range of video characteristics (i.e., including video clips with localized motion and global motion 
in the scene). We simulated CS measurements for 24 sequences (48-frames each) in the same manner as the 
Foreman sequence and recovered the frames using the same methodology described above (including parameters 
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Fig. 8. Percent improvement of RWLl-DF over other algorithms for compressive recovery of video sequences (calculated as described in 
the text). Displayed is both the mean improvement (error bars indicating the normalized standard deviation) and the median improvement 
(error bars indicating the 25th and 75th percentile) for each algorithm, (a) Results for the full database of 24 sequences from a BBC nature 
documentary, (b) Results for the 13 sequences that were especially challenging for CS recovery, illustrating the benefits of RWLl-DF in this 
particularly difficult regime. 


optimized on the Foreman sequence). Figure shows the mean and median improvement of RWLl-DF relative 
to the other algorithms being evaluated. Specifically, the plotted mean improvement is the average of the rMSE 
difference between RWLl-DF and the comparison algorithm at each frame normalized by the average rMSE for 
the comparison algorithm across the whole video clip. The median improvement is calculated in the same manner. 

The recovery results for this video database show consistent performance improvements for RWLl-DF when 
evaluated over all video sequences in this database. Additionally, we note that some video sequences were sig¬ 
nificantly richer in texture and motion than others, resulting in a more challenging recovery task. We identified 
13 such video clips fhaf were especially challenging (i.e, those where the average rMSE reconstruction for BPDN 
is over 1%). For these clips we plot the mean and median percent improvement in Figure RWLl-DF shows 
very significant improvements within these video sequences, indicating that RWLl-DF is especially beneficial in 
challenging recovery scenarios. Additionally, an adaptation of a similar RWLl method to a spatial filtering setting 
has shown similar performance improvements in the most challenging data examples | [58| . 

VI. CONCLUSIONS 

In this work we explore two causal dynamic filtering schemes for time-varying sparse signals that are based on 
minimization approaches. We examine both a simple method (BPDN-DF) that admits a theoretical analysis as 
well present a novel algorithm (RWLl-DF) that achieves state-of-the-art performance by leveraging second-order 
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statistics in a similar manner as Kalman filtering. We conclude from these results that minimization methods 
have significant potential to he extended from static sparse signal estimation to time-varying data streaming settings. 
While not discussed in detail, the approaches in this paper can he solved efficiently through simple approaches 
such as an Iterative Shrinkage and Thresholding Algorithm (ISTA). Any future advances in numerical algorithms 
for optimization will allow for faster and larger scale implementations of these approaches. 

There are a number of potential modifications and improvements to the descrihed approaches that will he pursued 
in future work. First, while classic dynamic filtering approaches assume known dynamics, it would he natural 
to estimate or learn complex dynamics as part of the proposed model structure. Second, this paper follows the 
convention of classic dynamic filtering work hy assuming that the dynamics model operates in the state space 
(i.e., predicting the variances from the previous state estimates). In scenarios where the dynamics operations are 
not derived from physical principles and may he learned from data, RWLl-DF may he more effective when the 
dynamics model predicts the variances from the previous variance estimates. 


VII. Appendix 

A. Temporal bound derivation 

This appendix proves the result in Theorem III.1| While the results do not depend on the numerical algorithm 


used to solve the optimization, the proof technique will leverage ISTA in the analysis |16|, [471, |59|. This proof 
will follow hy using the ISTA update equation at each time-step n and algorithmic iteration I, along with the 
RIP condition on and Lemma |VII.I (presented helow), to obtain a recurrence relation for the absolute error 
between the current signal estimate. We then solve the recurrence relation for the analytical upper bound for each 
I, n independent of the previous ISTA iteration I + 1 (but still dependent on the previous time-step n — 1), allowing 
us to take I — )• oo and obtaining a recursive steady state bound in terms of only n. We then solve the new recursion 
equation in terms of n, yielding an upper bound for all n depending only on the initial condition and the properties 
of the dynamics and measurement functions. 

We start by noting that ISTA for BPDN-DF can can be written as 


= zi + ^W^(Gl{yn-GnWzi) 

+K (^f{WZn-l)-Wz^^)) 

(14) 


where u is an un-thresholded state, T.y(-) is a soft-thresholding function with threshold parameter 7, and rj is the 
algorithm’s step size. The proof continues by demonstrating that error between the estimate and the true signal 
contracts at each algorithmic step. Using the fact that the converged solution of ISTA minimizes the corresponding 
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cost function p7l , | [59l , the error bound on the ISTA solution is then also a bound for the overall BPDN-DF 
minimizer. For ease of notation, we omit temporal subscripts, assuming that all variables, unless otherwise stated, 
have temporal subscript n. Additionally, we define the previous (steady state) estimate as = z and refer to zh 
as the true coefficients at time n. We define fwo subsets of the sparse vector z: J and J'. The set J = J[l + 1] is 
the union of the current set of active coefficients in the ISTA algorithm r[(], the q largest elements of the vector u 


A[/], and the true active set Fp In 116| it is shown that | J| = \ A[l + 1] U r[f] U Fjl < 5 + 2q. Similarly we define 
J' = J[l + 2] = A[Z + 1] U r[/ + 1] U Fp To sfart, we require the following lemma which bounds the norm of u at 
each algorithmic step and at each n: 

2 


Lemma VII.1. Suppose that the same conditions as in Theorem 


III.l 


hold. Additionally, assume that 


zt 


< b 


for all n. The vector u^j (the ISTA variables restricted to the support subset J) at each algorithmic iteration I 
obtained via ISTA (iterating Equation ( |14| ) ) with a step size of p satisfies 

rjd 


ujh < I?? - 1| + 


1 + K J 1 + K 


T n , ( , 

s/q + [r] + V -^ 1 b 


K + 1 


sJl + (i_ VK _ 
+f/-r--e + ^ 


1 + K 


1 + K ’ 


And with the restriction 


7]{k + nf* + 1 + (5)6 + ps/l + (5e + rjnu 
|r/(l + k) - 1 | + r)(5) 


< 1 - 


1 + At 




then ||irj ||2 is simply bounded by UrtjH^ < ^y/q 

We start by writing the norm using the definition of u in the ISTA algorithm: 
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Using the fact that y = GWz^ + e. 
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(15) 


To properly reduce the portion of this expression depending on the dynamics /(•), we note that since the dynamics 
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satisfies Wzt = f{Wz1_-^^) + Un, 


fiWz) - Wz^ = f{Wz) - Wz^ + Wz^ - Wz‘ 
= f{Wz)-f{Wzi_,)-u 
+W (z^ -z^ 


Setting rj = T]/{I + k) and collecting similar terms, 




< 


z^j + ?j {{GW fjGW + kIj) i^z^ - z^ 
+i?Vr'^ (Gje - Ku) 
+?jKWj[fiWz)-f{Wzi_,))\\^ 

z^j + ^{{GWfjGW + kIj) - z^ 

+77\/l + (5 ||e||2 + ^ k \\ u \\2 


+riK 


Wj [f{Wz) - fiWzi_, 


< 


< 


z^j + 7j{iGW)^GW + kIj) (z^-z^ 


+t/VT+~ 5 ||e ||2 + r/K II 1 /II 2 + rjnf* ||e„_i ||2 
{?i{GWfjGW-{l-K7i)Ij)z^j 


+ 


r]{G'jG + kIj)Wz'' +r7VT+~5||e 


+r]K\\u \\2 + t/k/* ||e„_i| 


2 ■ 


(16) 


where the first and third inequalities follow from the triangle inequality, the fact that ||W^|| < 1 and the RIP of G, 
and the second inequality follows from the smoothness of /(•). 

We now use the Cauchy-Schwartz inequality with the fact that, since G satisfies RIP(| J|, 6) with respect to W, 


\aiWG)'^{GW)j + /3Ij\L <\a + /3\+ ad, 


for any constants a, (3, to hound the first two terms of Equation [T^ as 


(17) 




< dry + Kr/- 1| + r?(5) z' +r]{K+l + 6) 


+r/VT+~5 ||e ||2 + ?7K II 1 ZII 2 + rjKf* ||e„_i ||2 


Simplifying, we obtain 




< (|r/(l + k) - 1| + T/(i) 7 y^ +7/(k + 1 + 5)6 
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+r/\/l + <5e + riKiy + rjKf* 


l^n—1| I2 ? 


where h is the maximum energy of (i.e. ||2:^||2 < h') and 7 = 7/(1 + k). 

Ideally, this Lemma should he independent of the previous estimation error norm ||e„_i||2 for the Lemma to hold 


for all n. If we initialize the estimate with the zero vector, clearly 11 eg 112 = 




< b. Thus, setting ||e„_i||2 < b 


results in the Lemma statement and it remains to ensure that | |e„| I2 < 6 for all n through later parameter restrictions. 


B. First Recursion 


With Lemma VII.l we seek a recursive expression for the estimation error at algorithmic iteration I + 1, 
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< 7\/9 + I|en-i|l2 + + 5e + r]KU 

+{\r}+rjK- 1 | +rj 6 )\\e^\\ 2 , 

where the first inequality follows from the triangle inequality, the second inequality follows from the nature of the 


thresholding function and the definition of u, the third inequality follows from Lemma VII. 1 for the first term and 


a similar set of steps as used to prove Lemma VII. 1 used on the second term. This recursive formula can now he 
solved for | |e^| I2, 

||e[f]||2 < (|7-l|+^'5)'(||e°||2 

ly/Q + VK-f* I|en-l|l2 + + <^6 + rjKu \ 

1 — \r] — l\ —rjS j 

^7y/Q + Vf^f* I|en-l|l2 + 


As I 


1 — \r] — 1\ — r]6 
oo, ISTA converges to a solution with a hounded error of 
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^ 1\/Q+ 111^1* I|en-i|l 2 + r]Vl + Se + rjKL' 

~ I — \r] — 1\ — rj6 


C. Second Recursion 


The steady state error of ISTA can now be treated as a new recursive equation on 11 e„ 11 2 in terms of the signal 
time-step n. This expression can be solved for ||e „||2 

^ , rjnf* 

in\\2 S 


1 — \ri — 1\ — r]6J ^ 

ly/Q + ^\/TT5e + \ 

1 — \r] — 1\ — rj6 — rjnf* j 

ly/Q + rjVT+~6e + rjnv 
1 — \r] — 1\ — rj6 — rjnf* ’ 


which yields a bound for the error at every iteration n. 


holds. If h bounds ||e „||2 at each 

n, then 


The only remaining task is to ensure 11 


I2 — 


for all n so that Lemma 


VII. 1 


holds. Simplifying, 


/5"(l|eo|l2 


a , a 
1-/3^ ^ 1-/3 


< /3"(6 


a . a 
1-/3^ ^ 1 


< b 


a(l-/3”) 

1-/3 


<(l-/3>^^<6, 


or in terms of the model and system parameters, 

(1 + k) 7\/Q + ^\/l + be + r]KV 
(1 + k )(1 — \r] — 1 |) — r]5 — rjnf* ~ 

Setting T/ = (1 + (3)“^ < 2(1 + <3)“^ completes the proof. 
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